I. INTRODUCTION



The quality of speech output in adaptive aperture coding (Ref. 1) has been improved by two refinements: (i) a code selection procedure based on formal error minimization rather than on observations of aperture crossing as in Ref. 1, and (ii) a simple adaptive low-pass filtering operation based on adjacent-sample correlation values measured on a short-term basis (typically, once every 20 to 30 ms). We describe these refinements with special reference to a 7-code aperture characteristic designed for an average output rate of 1.2 bits/sample and for speech inputs sampled at 8 and 12 kHz. At the corresponding bit rates (9.6 to 14.4 kb/s), adaptive aperture coding used with a first-order adaptive predictor is a medium-complexity approach to time-domain coding. Its output speech quality is below toll quality but nevertheless useful in many applications. A natural application of aperture coding is speech storage, where a variable output bit rate is less objectionable than in transmission.

Adaptive aperture coding is a medium-complexity approach to the digitization of slowly changing waveforms. In a recently described procedure (Ref. 1), the idea was to form an aperture centered on the last encoded waveform sample and to avoid further encoding until the waveform crossed that aperture. Three features made the system applicable to low-bit-rate digitization of speech. The first was an arrangement that precluded the need for explicit encoding of aperture crossing times. The second was a syllabic adaptation algorithm that varied the aperture width to follow the nonstationarity of the speech waveform. The third was the use of the aperture coder in a differential quantization mode, in conjunction with an adaptive first-order predictor. Reference 1 also addressed the variable-output-rate characteristics of aperture coding and suggested that a typical application of the procedure may be voice storage, where variable-rate characteristics would be less objectionable than in real-time communication.

The purpose of this brief is to describe two refinements that have improved the quality of the speech output from an aperture coder: (i) a code selection procedure based on a formal error-minimization rule, rather than on observations of aperture crossing as in Ref. 1, and (ii) a simple, time-varying low-pass filter based only on short-term adjacent-sample correlation, an item of information that is already available in adaptive first-order prediction.

II. APERTURE CODING BASED ON APERTURE CROSSINGS 

Refer to the 7-point aperture (or quantization) characteristic of Fig. 1a. For the input waveform X shown in the figure, the output will be P3, signifying that an aperture crossing occurred prior to time 3. At the receiver (or decoder), P3 conveys two items of information: timing information, to the effect that the output Y will be updated at time 3, and amplitude information, in the sense that the updating magnitude must be positive and greater than that of P3 on the characteristic. For example, the updating magnitude can simply be that of the code point immediately preceding the transmitted code, and the shape of the aperture characteristic is optimized as in eq. (1) below to make this amplitude convention appropriate from a quantization-noise viewpoint (Ref. 1). The code P3 also implies that the updating is zero at time 2, and the output approximation to the waveform X will therefore follow the dashed line of Fig. 1a. Output codes P1, P2, N1, N2, and N3 have corresponding interpretations for waveform updating. For example, if N1 is transmitted, an appropriate negative update occurs at time 1, and the aperture coding procedure repeats with a new aperture characteristic beginning at time 2. We will therefore associate the code N1 (also P1) with a run length of 1. The code P3 in Fig. 1a implies a run length of 3 (equal to the aperture length L = 3). The zero code Z occurs if the waveform has not crossed the aperture even at time L; this will be arbitrarily referred to as a run length of L + 1. When code Z is received, the Y-sequence is not updated at any time during the current aperture.

Fig. 1— Adaptive aperture coding based on (a) aperture crossing observations and (b) error-minimizing code selection.
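The crossing rule above can be sketched in a few lines of Python. This is a minimal illustration, not the coder of Ref. 1. It omits the first-order predictor, so the aperture is applied directly to the waveform rather than to a prediction residual. It assumes that a crossing at phase t means |X_t − Y| ≥ A(t), and it uses the exponential widths of eq. (1) with hypothetical values A_0 = 100 and J = 2. For the update magnitude of code P_t or N_t, it adopts the "preceding code point" convention, A(t − 1), with A(0) = A_0. All function names are hypothetical.

```python
def aperture_widths(a0: float, j: float, length: int) -> list[float]:
    """Widths A(t) = A0 * 2**(-t/J) for t = 0..length (eq. (1)); index t holds A(t)."""
    return [a0 * 2.0 ** (-t / j) for t in range(length + 1)]


def crossing_code(x: list[float], y_last: float, widths: list[float]) -> tuple[str, int]:
    """Return (code, run_length) for one aperture; x[t-1] is the input at phase t.

    Assumption: a crossing at phase t means |x - y_last| >= A(t).
    """
    length = len(widths) - 1                     # L phases per aperture
    for t in range(1, length + 1):
        d = x[t - 1] - y_last
        if d >= widths[t]:
            return f"P{t}", t                    # crossed the upper aperture edge
        if d <= -widths[t]:
            return f"N{t}", t                    # crossed the lower aperture edge
    return "Z", length + 1                       # no crossing: run length L + 1


def crossing_update(code: str, widths: list[float]) -> float:
    """Decoder update: magnitude of the preceding code point, A(t-1) (assumed convention)."""
    if code == "Z":
        return 0.0                               # Y is not updated during this aperture
    t = int(code[1:])
    sign = 1.0 if code[0] == "P" else -1.0
    return sign * widths[t - 1]


widths = aperture_widths(100.0, 2.0, 3)          # hypothetical A_0 = 100, J = 2, L = 3
code, run = crossing_code([20.0, 45.0, 60.0], 0.0, widths)
print(code, run, crossing_update(code, widths))
```

With X = (20, 45, 60) and a last encoded value of 0, no crossing occurs at phase 1 (20 < 70.7) or phase 2 (45 < 50). The crossing at phase 3 (60 ≥ 35.4) yields P3 with run length 3. The decoder then steps Y up by A(2) = 50, which exceeds A(3), as the amplitude convention requires.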

It is desirable for an aperture characteristic to decay exponentially (Ref. 1).
The width of a single aperture characteristic at phase t is 

$$A(t) = A_0 \cdot 2^{-t/J}, \qquad (1)$$

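where the initial aperture width $A_0$ is adapted by cues derived from a history of the k most recent run lengths R. Thus, for the (r + 1)st aperture,

$$A_0^{(r+1)} = \begin{cases} G_1 \cdot A_0^{(r)} & \text{if } (\mathrm{ADAPT})^{(r)} = 0 \\ A_0^{(r)} + G_2 & \text{if } (\mathrm{ADAPT})^{(r)} = 1 \end{cases}, \qquad G_1 = 1 - \epsilon, \quad \epsilon \to 0,$$

$$(\mathrm{ADAPT})^{(r)} = \begin{cases} 1 & \text{if } \sum_{j=r-k+1}^{r} R_j < K \\ 0 & \text{otherwise} \end{cases}, \qquad \max\left[A_0^{(r+1)}\right] = A_0^{\mathrm{MAX}}. \qquad (2)$$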
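One step of the adaptation rule (2) might be coded as follows. The sketch reads eq. (2) as follows: when the recent runs are long, the aperture shrinks by the factor G_1; when they are short, it grows by the additive increment G_2; and it never exceeds the ceiling A_0^MAX. The defaults k = 3 and K = 7 are the typical values quoted in the text. The G_1, G_2, and A_0^MAX values in the example calls are hypothetical placeholders, because the Table II values are not reproduced here.

```python
def adapt_initial_width(a0: float, run_lengths: list[int], g1: float, g2: float,
                        a0_max: float, k: int = 3, big_k: int = 7) -> float:
    """Return A_0 for the next aperture from the k most recent run lengths (eq. (2) sketch)."""
    recent = run_lengths[-k:]                    # the k most recent run lengths R
    adapt = 1 if sum(recent) < big_k else 0      # (ADAPT) = 1 when runs are short overall
    if adapt:
        a0_next = a0 + g2                        # short runs: widen the aperture by G_2
    else:
        a0_next = g1 * a0                        # long runs: shrink by G_1 = 1 - eps
    return min(a0_next, a0_max)                  # never exceed A_0^MAX


# Hypothetical parameters: G_1 = 0.98, G_2 = 50, A_0^MAX = 4000.
print(adapt_initial_width(100.0, [1, 1, 2], 0.98, 50.0, 4000.0))   # three short runs
print(adapt_initial_width(100.0, [4, 4, 4], 0.98, 50.0, 4000.0))   # three Z codes
```

Because the increase is additive and the decrease is multiplicative, G_2 and A_0^MAX carry amplitude units, while G_1 is dimensionless.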

Typically, for a 7-code characteristic (L = 3) with a 1.2-bit/sample average output rate, k = 3 and K = 7 (Ref. 1), and appropriate values for J, $G_1$, $G_2$, and $A_0^{\mathrm{MAX}}$ are those summarized in Table II. With these values of k, K, $G_1$, and $G_2$, a predominance of high-run-length code words (for example, Z) will imply that $A_0^{(r+1)} < A_0^{(r)}$, while a predominance of low-run-length code words (for example, P1 or N1) will imply that $A_0^{(r+1)} > A_0^{(r)}$.
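A little arithmetic shows why k = 3 and K = 7 suit the 1.2-bit/sample target. Assume, for illustration only, that each of the 7 code words costs log2 7 ≈ 2.81 bits; the actual code assignment may differ. A rate of 1.2 bits/sample then requires an average run length of about 2.81/1.2 ≈ 2.34 samples, so k = 3 consecutive runs sum to about 7.0. That is where the threshold K sits. Sums of K or more (long runs, low rate) shrink the aperture and shorten later runs, while sums below K widen it. The rule therefore steers the mean run length toward K/k. The same 1.2 bits/sample gives the 9.6 and 14.4 kb/s figures at 8 and 12 kHz.

```python
import math


def mean_run_length(bits_per_sample: float, n_codes: int = 7) -> float:
    """Average samples per code word, assuming each code costs log2(n_codes) bits."""
    return math.log2(n_codes) / bits_per_sample


r_bar = mean_run_length(1.2)
print(f"mean run length ~ {r_bar:.2f}; sum over k = 3 runs ~ {3 * r_bar:.2f} (K = 7)")
for fs_hz in (8000, 12000):
    print(f"fs = {fs_hz} Hz -> {1.2 * fs_hz / 1000:.1f} kb/s")
```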

Table I provides a numerical description of the example in Fig. 1a, including data on input X, output Y, and quantization error

$$Q = Y - X. \qquad (3)$$

III. APERTURE CODING BASED ON ERROR-MINIMIZING CODE SELECTION

As shown in Fig. 1a, the aperture crossing method of Ref. 1 and Section II suffers from slope overload: rapidly changing inputs are hard to follow. The severity of this problem is a direct result of requiring the aperture (or quantizer) characteristic to handle both time and amplitude information simultaneously, in an integrated manner. The slope-overload problem is significantly mitigated by a code selection procedure based not on aperture crossing per se, but on minimization of the locally averaged variance (or averaged magnitude) of the quantization error Q.

In the new procedure, an output code has a slightly different interpretation. Thus, code P3 implies updating at time 3 by an amplitude equal to that of point P3 on the characteristic. (This is unlike Section II, where code P3 implied a crossing prior to time 3 and an update at time 3 greater than the value of P3.)
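Under this interpretation, the decoder output for a single code is easy to state. The hypothetical helper below returns the tentative Y-samples that a code implies over its l(C) samples. It assumes that Y holds at the last encoded value until the code's phase and then steps by ±A(t); for Z, Y holds for all L phases. As in the rest of this sketch, the predictor is omitted, and widths[t] stands for A(t) of eq. (1), with hypothetical A_0 = 100 and J = 2.

```python
def tentative_output(code: str, y_last: float, widths: list[float]) -> list[float]:
    """Y samples implied by `code` under error-minimizing selection (hypothetical helper).

    widths[t] is the aperture width A(t) for t = 0..L, so L = len(widths) - 1.
    Code "P<t>" / "N<t>": Y holds at y_last for phases 1..t-1, then steps by +/-A(t).
    Code "Z": Y holds at y_last for all L phases.
    """
    length = len(widths) - 1
    if code == "Z":
        return [y_last] * length
    t = int(code[1:])
    step = widths[t] if code[0] == "P" else -widths[t]
    return [y_last] * (t - 1) + [y_last + step]


widths = [100.0 * 2.0 ** (-t / 2.0) for t in range(4)]   # hypothetical A_0 = 100, J = 2, L = 3
print(tentative_output("P2", 0.0, widths))               # [0.0, 50.0]
```

Note the contrast with Section II. Here code P3 steps Y by A(3) itself, not by a larger amplitude.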

The code selection is now realized in three steps. First, a tentative Y-waveform is reconstructed for each code in the characteristic. Next, the average quantizing error power (or magnitude) of each waveform is computed. Finally, the code and waveform with the least average quantizing error power (or magnitude) are selected. For a 7-code characteristic, the averaging is over 3 (X, Y) pairs for codes Z, P3, and N3, over 2 (X, Y) pairs for P2 and N2, and over one (X, Y) pair, viz., (X1, Y1), for codes P1 and N1. This code selection procedure is reminiscent of delayed or tree encoding (Refs. 2 and 3).

Table I — Numerical comparison of the two aperture coding methods in Fig. 1 (aperture-crossing and error-minimizing code-selection examples)

For the X-waveform example of Fig. 1, the code that minimizes average error power turns out to be P2, and this leads to the Y-reconstruction shown by the dashed lines of Fig. 1b. Note that this waveform tracks X with much less slope overload than the Y-waveform in Fig. 1a.

Table I compares the effects of choosing codes P2 and P3 numerically, using the average error-power criterion $E[Q^2]$. P2 is chosen because its final average error power (0.065 at time 2) is lower than that of code P3 (0.39 at time 3).

The code selection procedure is formally defined by

$$\text{Select code } C_m \text{ if } E[Q^2]\big|_{C_m} < E[Q^2]\big|_{C_p} \text{ for all codes } C_p,\ p \neq m, \qquad (4)$$

with

$$E[Q^2]\big|_{C_p} = \frac{1}{l(C_p)} \sum_{i=1}^{l(C_p)} Q_i^2\Big|_{C_p},$$

where $l(C_p)$ is the number of samples up to and including the point where $C_p$ appears on the characteristic. Clearly,

$$\max_p\left[l(C_p)\right] = L.$$
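Equation (4) amounts to an exhaustive search over the 2L + 1 codes. The minimal sketch below assumes, for illustration, no predictor, widths[t] = A(t) of eq. (1) with hypothetical A_0 = 100 and J = 2, and a hypothetical input segment. The names are also hypothetical. Setting use_magnitude=True replaces $E[Q^2]$ with the averaged-magnitude variant mentioned in the text. Eq. (4) uses a strict inequality, so the sketch resolves ties arbitrarily in favor of the code examined first.

```python
def select_code(x: list[float], y_last: float, widths: list[float],
                use_magnitude: bool = False) -> tuple[str, list[float]]:
    """Pick the code minimizing the average error over its l(C) samples, as in eq. (4).

    x holds at least L input samples for the current aperture (x[0] is X_1);
    widths[t] is A(t) for t = 0..L.
    """
    length = len(widths) - 1
    codes = ["Z"] + [f"{s}{t}" for t in range(1, length + 1) for s in ("P", "N")]
    best_err, best_code, best_y = float("inf"), "Z", []
    for code in codes:
        if code == "Z":
            y = [y_last] * length                      # l(Z) = L samples, no update
        else:
            t = int(code[1:])
            step = widths[t] if code[0] == "P" else -widths[t]
            y = [y_last] * (t - 1) + [y_last + step]   # l(C) = t samples
        q = [yi - xi for yi, xi in zip(y, x)]          # Q = Y - X, eq. (3)
        err = sum(abs(e) if use_magnitude else e * e for e in q) / len(q)
        if err < best_err:                             # strict "<": first code wins ties
            best_err, best_code, best_y = err, code, y
    return best_code, best_y


widths = [100.0 * 2.0 ** (-t / 2.0) for t in range(4)]   # hypothetical A_0 = 100, J = 2
print(select_code([20.0, 45.0, 60.0], 0.0, widths))      # ('P2', [0.0, 50.0])
```

For the hypothetical input X = (20, 45, 60), the search returns P2, with a mean squared error of 212.5 over two samples, against about 1011 for P3 over three samples. An aperture-crossing rule applied to the same input would wait for phase 3.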

The superiority of the error-minimizing approach has been confirmed by extensive computer simulations involving different input samples and two sampling frequencies, 8 and 12 kHz. Design parameters were separately optimized, as shown in Table II. Signal-to-noise comparisons are provided in Table III, where the S/N ratios are ratios of signal variance to quantization-error variance. SNR is the conventional long-time averaged ratio expressed in decibels. The segmental ratio SNRseg (Ref. 4) is the average of short-term S/N values (each computed over, say, 16 ms), each expressed in decibels before averaging. This procedure better reflects the rendition of low-level waveform segments.

Table II — Desirable designs for two aperture coding systems with L = 3 and average output bit rate of 1.2 bits/sample. Values of $G_2$ and $A_0^{\mathrm{MAX}}$ are appropriate for a maximum speech amplitude of ±32000.

Table III — Objective performance comparisons for the two aperture coding methods used in conjunction with first-order adaptive prediction. F-subscripts refer to ratios after adaptive filtering. All S/N ratios are in decibels.

Fig. 2— Waveforms of (a) input speech and quantizing error in adaptive aperture coding based on (b) aperture crossing and (c) error-minimizing code selection [error waveforms (b) and (c) have been magnified by a factor of 5].
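A short computation makes the difference between the two objective measures concrete. In the sketch below, the 16-ms segment length follows the text. Skipping segments with zero signal or zero error, whose decibel value is undefined, is an assumption of this illustration; other implementations may treat such segments differently. The toy signal and the 1-kHz rate are hypothetical.

```python
import math


def snr_db(x: list[float], y: list[float]) -> float:
    """Long-time SNR: total signal power over total error power Q = Y - X, in dB."""
    signal = sum(v * v for v in x)
    error = sum((b - a) ** 2 for a, b in zip(x, y))
    return 10.0 * math.log10(signal / error)


def snrseg_db(x: list[float], y: list[float], fs_hz: float, seg_ms: float = 16.0) -> float:
    """Segmental SNR: mean of per-segment SNRs, each converted to dB before averaging."""
    n = int(round(fs_hz * seg_ms / 1000.0))    # samples per segment (128 at 8 kHz)
    values = []
    for start in range(0, len(x) - n + 1, n):  # whole segments only
        xs, ys = x[start:start + n], y[start:start + n]
        signal = sum(v * v for v in xs)
        error = sum((b - a) ** 2 for a, b in zip(xs, ys))
        if signal > 0 and error > 0:           # assumption: skip undefined segments
            values.append(10.0 * math.log10(signal / error))
    if not values:
        raise ValueError("no segment with nonzero signal and error")
    return sum(values) / len(values)


# Toy example at a hypothetical 1 kHz rate: one loud and one quiet 16-sample segment,
# each corrupted by a constant error of 1.
x = [100.0] * 16 + [1.0] * 16
y = [v + 1.0 for v in x]
print(f"SNR = {snr_db(x, y):.1f} dB, SNRseg = {snrseg_db(x, y, 1000.0):.1f} dB")
```

The loud segment scores 40 dB and the quiet one 0 dB, so SNRseg is 20 dB. The long-time SNR is about 37 dB because the loud segment dominates it. This is why the segmental measure better reflects low-level segments.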

Finally, Fig. 2 illustrates that the error waveform also tends to be more noise-like (less speech-correlated) when error-minimizing code selection is employed, which indicates a perceptual improvement in the aperture-coding process.

IV. ADAPTIVE LOW-PASS FILTERING OF OUTPUT SPEECH 

At bit rates in the range of 9.6 to 16 kb/s, we have found it very desirable to smooth the output of an aperture coder with some kind of adaptive low-pass filter. In fact, even a sloppy low-pass filter characteristic such as

$$(Y_F)_r = B \cdot (Y_F)_{r-1} + (1 - B) \cdot Y_r, \qquad (4)$$

where $Y_F$ represents a filtered version of Y, is quite effective provided B is appropriately adaptive. We have studied the sloppy procedure (4) at some length (Ref. 5) because it involves a single parameter B. This parameter can be meaningfully related to the local adjacent-sample correlation C of the speech waveform, a quantity already available in first-order adaptive prediction (with time-varying coefficient $h_1 = C$).
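The adjacent-sample correlation C can be estimated per block as the normalized lag-1 autocorrelation. This ratio is also the autocorrelation-method choice of the first-order predictor coefficient $h_1$, which is why $h_1 = C$. The sketch below uses this common estimator. The estimator and block length actually used in the authors' predictor are not given here, so both are assumptions. The text's once-per-20-to-30-ms update corresponds to blocks of 160 to 240 samples at 8 kHz.

```python
def adjacent_correlation(block: list[float]) -> float:
    """Normalized lag-1 autocorrelation C = sum x[n]x[n-1] / sum x[n]^2 over one block.

    For a first-order predictor x_hat[n] = h1 * x[n-1], this ratio is the
    autocorrelation-method estimate of the optimal h1, hence h1 = C.
    """
    num = sum(block[n] * block[n - 1] for n in range(1, len(block)))
    den = sum(v * v for v in block)
    return num / den if den > 0.0 else 0.0     # assumption: C = 0 for an all-zero block


def block_correlations(x: list[float], block_len: int) -> list[float]:
    """One C value per block of block_len samples (e.g., 160 samples = 20 ms at 8 kHz)."""
    return [adjacent_correlation(x[i:i + block_len]) for i in range(0, len(x), block_len)]
```

A slowly varying, voiced-like block gives C near 1. A rapidly alternating block gives C near −1.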
An interesting adaptation approach is exemplified by

$$B(C) = P \cdot C + Q, \qquad (5)$$

with typical (P, Q) settings of (0.4, 0.4) or (0.3, 0.3). (Here P and Q are filter constants, not the code labels or the quantization error of Sections II and III.) The basic idea is to provide the most smoothing (greatest B) for the very slowly varying (C → 1) waveform segments of voiced speech.
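Combining the filter (4) with the adaptation rule (5) gives the sketch below, which recomputes B = P·C + Q once per block from a supplied list of C values. A zero initial filter state and a block-wise (rather than sample-wise) update of B are assumptions of this illustration. For |C| ≤ 1, the setting (P, Q) = (0.4, 0.4) keeps B within [0, 0.8], and (0.3, 0.3) keeps it within [0, 0.6]. In both cases the recursion is stable and B is never negative.

```python
def adaptive_smooth(y: list[float], c_values: list[float], block_len: int,
                    p: float = 0.4, q: float = 0.4) -> list[float]:
    """Adaptive first-order smoothing: (Y_F)_r = B (Y_F)_{r-1} + (1 - B) Y_r, B = P*C + Q.

    c_values[i] is the adjacent-sample correlation C for block i (block_len samples).
    """
    out = []
    yf_prev = 0.0                                   # assumption: zero initial filter state
    for r, y_r in enumerate(y):
        b = p * c_values[r // block_len] + q        # eq. (5): more smoothing as C -> 1
        yf_prev = b * yf_prev + (1.0 - b) * y_r     # eq. (4)
        out.append(yf_prev)
    return out


print(adaptive_smooth([1.0, 1.0, 1.0, 1.0], c_values=[1.0, 0.0], block_len=2))
```

For a unit-step input, the first block (C = 1, B = 0.8) rises slowly, giving about 0.2 and then 0.36. The second block (C = 0, B = 0.4) catches up faster, giving about 0.744 and then 0.898.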

When C → 1, the local bandwidth tends to be low (much less than half the sampling rate), and low-pass filtering of the output is clearly very effective at rejecting quantizing noise.
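The noise-rejection argument can be made quantitative. The filter (4) has transfer function H(z) = (1 − B)/(1 − B z⁻¹), so its gain is 1 (0 dB) at DC and (1 − B)/(1 + B) at half the sampling rate. With (P, Q) = (0.4, 0.4), that is about −19 dB for B = 0.8 (C → 1) but only about −7 dB for B = 0.4 (C = 0). The short sketch below evaluates the response numerically. The 8-kHz rate and the chosen frequencies are illustrative only.

```python
import cmath
import math


def gain_db(b: float, f_hz: float, fs_hz: float) -> float:
    """Gain of (Y_F)_r = B (Y_F)_{r-1} + (1 - B) Y_r at frequency f_hz, in dB."""
    z_inv = cmath.exp(-2j * math.pi * f_hz / fs_hz)   # z^-1 on the unit circle
    h = (1.0 - b) / (1.0 - b * z_inv)                 # H(z) = (1 - B) / (1 - B z^-1)
    return 20.0 * math.log10(abs(h))


fs = 8000.0
for c in (1.0, 0.5, 0.0):
    b = 0.4 * c + 0.4                                 # eq. (5) with (P, Q) = (0.4, 0.4)
    gains = ", ".join(f"{gain_db(b, f, fs):6.1f}" for f in (0.0, 500.0, 1000.0, 4000.0))
    print(f"C = {c:.1f}, B = {b:.1f}: gains at 0/500/1000/4000 Hz = {gains} dB")
```

A single pole gives only a gentle roll-off, which is why the text calls this filter sloppy. It explains both the noise reduction and the loss of crispness noted below.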

The gains due to adaptive filtering as described in (4) appear as the objective improvements marked by F subscripts in Table III, while design principles for (4) and (5) are discussed elsewhere. Noise reduction with the first-order filter generally entails a concomitant loss of speech crispness. This loss can be avoided with sharper adaptive filters, a procedure that will be discussed separately (Ref. 6) in the context of a delta modulation coder.